Nicolae, Meng and Kong are to be congratulated 
on having treated an important practical problem 
in many scientific inquiries in which the investigator 
has chosen the testing procedure, but needs to know 
the impact of the missing data on the test in terms of 
the relative loss of information. To measure the relative
information, they propose to compare how the
observed-data likelihood deviates from flatness relative
to the same deviation in the complete-data likelihood.
Several Bayesian measures of this deviation
are explored and applied to the
study of genetics and genomics. As noted in their paper,
these measures are especially needed in small-sample
problems with incomplete data.

We would like to explore the use of this type of 
measure in two examples to indicate its wide ap- 
plicability and some computational issues. One con- 
cerns infectious disease data, which are usually highly 
dependent and incomplete; the investigators often 
need to decide if more data are needed, and in case 
they are, to know the type of data that is most de- 
sirable. The other concerns a test on the shape of a 
regression function; we will apply the Bayesian mea- 
sure of relative information to select design points 
for collecting more data. 

Because Bayesian tests are more tractable and 
natural than a frequentist approach in these two ex- 
amples, we consider the following extensions of their 
(25) for the measure of relative information: 
Here $E_0$ denotes the average over the conditional
posterior distribution given the null hypothesis. To shorten
the presentation, we use only (BI3) in the following
discussion.

1. INFECTIOUS DISEASE DATA 

As discussed in Rhodes, Halloran and Longini (1996), 
there are several levels of information in the study of 
infectious disease data and it is of interest to decide 
the level of information in the study. We consider
two levels of information in a simple model to illustrate
the way that (BI3) may be used in this situation.
Suppose there is a collection of disjoint households
that suffer a transmissible disease and an individual
can only be infected by members in the same
household. We assume an S-I-R model; at any time
point, each individual is in one of the three states:
susceptible (S), infectious (I) or removed (R); a sus- 
ceptible individual may become infectious and an 
infectious individual may become removed. Assume 
there are m people in one household. The transition 
of the health status of people in one household is de- 
scribed by the following counting process. We note 
that counting process modeling of infectious disease 
data is discussed in Becker (1989) and Andersson 
and Britton (2000), among others. 

For $i = 1, \ldots, m$, let $N_i(t)$ be 1 if the $i$th individual
has been infected at time $t$ and be 0 if not; for
$i = m+1, \ldots, 2m$, let $N_i(t)$ be 1 if the $(i-m)$th
individual has been removed at time $t$ and be 0 if
not. Let $I(t)$ denote the number of infectious people
at time $t$. Here $t > 0$. Assume $N_1(0) = 1$, which
means individual 1 is the first infected person. Assume
that $P(N_i(t+h) - N_i(t) = 1 \mid \mathcal{F}_t) = h\lambda_i(t) + o(h)$.
Here $\lambda_i(t) = \beta_0 \exp(\beta_1 Z_i) I(t-)(1 - N_i(t-))$ for
$i = 1, \ldots, m$, and $\lambda_i(t) = \gamma_0 (N_{i-m}(t-) - N_i(t-))$ for
$i = m+1, \ldots, 2m$; $\mathcal{F}_t$ is the history up to time $t$. The
parameters $\beta_0$ and $\gamma_0$ are respectively called the infection
rate and the removal rate.
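
To make the counting process concrete, the following is a minimal sketch of how one household could be simulated. It assumes a standard Gillespie-type event simulation, which is our choice for illustration and is not stated in the text: each susceptible individual is infected at a rate equal to the infection rate times exp(coefficient times covariate) times the current number of infectious people, and each infectious individual is removed at the removal rate. The function name `simulate_household` and its arguments are hypothetical.

```python
import math
import random


def simulate_household(m, beta0, beta1, gamma0, z, rng):
    """Simulate one household of size m (illustrative sketch).

    Returns (infection_times, removal_times); an entry is None if the
    individual was never infected. Individual 0 is the initial case,
    infected at time 0.
    """
    infection = [None] * m
    removal = [None] * m
    infection[0] = 0.0
    t = 0.0
    while True:
        infectious = [i for i in range(m)
                      if infection[i] is not None and removal[i] is None]
        if not infectious:
            break  # no infectious people left: the outbreak is over
        susceptible = [i for i in range(m) if infection[i] is None]
        # Each susceptible i is infected at rate beta0 * exp(beta1 * z[i]) * I(t-).
        events = [("infect", i,
                   beta0 * math.exp(beta1 * z[i]) * len(infectious))
                  for i in susceptible]
        # Each infectious individual is removed at rate gamma0.
        events += [("remove", i, gamma0) for i in infectious]
        rates = [rate for _, _, rate in events]
        t += rng.expovariate(sum(rates))  # waiting time to the next event
        kind, i, _ = rng.choices(events, weights=rates, k=1)[0]
        if kind == "infect":
            infection[i] = t
        else:
            removal[i] = t
    return infection, removal


# Example: six members, alternating unvaccinated (0) and vaccinated (1).
rng = random.Random(2008)
inf_times, rem_times = simulate_household(
    6, beta0=1.0, beta1=-0.5, gamma0=1.0, z=[0, 1, 0, 1, 0, 1], rng=rng)
```

In the observation scheme discussed in this section, only `rem_times` (and the first infection time) would be treated as observed.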

Assuming the covariate $Z_i$ has value 0 or 1, we are
interested in testing the hypothesis $H_0$ that $\beta_1$ is less
than 0. When $Z_i = 1$ means that the $i$th individual
has been vaccinated, $\beta_1$ may represent the efficacy
of the vaccine.

We assume the removal times of all the removed 
individuals are observable and their infection times 
are not observable except the first one in the house- 
hold, which is assumed to be zero. We note that it is 
often easier to obtain removal times than infection 
times; the latter are often hard, if not impossible, 
to get; the sole purpose of assuming that the first 
infection time is observable is to simplify the pre- 
sentation. 

Suppose we have collected the observed data and
decided to test the hypothesis $H_0$ by considering
the ratio of the posterior probability to the prior
probability of the event $[\beta_1 < 0]$.
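
As a small illustration of this test statistic, the sketch below estimates the posterior probability of a negative vaccine coefficient from a list of posterior draws and divides it by the prior probability. The function name and the idea of passing the draws as a plain list are assumptions made for illustration; how the draws are produced is left to the sampler.

```python
def posterior_prior_ratio(beta1_draws, prior_prob):
    """Ratio of the posterior to the prior probability of the event [beta1 < 0].

    beta1_draws: posterior draws of the vaccine coefficient (e.g. MCMC output).
    prior_prob:  prior probability of the event, assumed known.
    """
    posterior_prob = sum(1 for b in beta1_draws if b < 0) / len(beta1_draws)
    return posterior_prob / prior_prob


# Toy draws, not real output: three of four are negative.
ratio = posterior_prior_ratio([-1.0, -0.2, 0.3, -0.5], prior_prob=0.5)
```

A ratio above 1 indicates that the data have shifted probability toward the null event.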

Viewing all the infection times except the first one
in each household as missing data, we can use (BI3)
to measure the fraction of missing information. Alternatively,
we may consider the removal times of, say,
four additional households as missing data and
calculate the corresponding (BI3). These two values of (BI3) might be useful
in deciding, when additional data are needed, which
type of additional data is more desirable. We illustrate
this method in the following simulation study.



Assuming $\beta_0 = 1$, $\beta_1 = -0.5$, $\gamma_0 = 1$, that there are 6
members in each household and that there are 20 households,
we generate a set of observed data; assuming
the priors for $\beta_0$ and $\gamma_0$ are exponentially distributed
as Exp(1) and that for $\beta_1$ is standard normal, we use
MCMC to generate the posterior distributions of the
parameters.

The relative information (BI3) has values 0.795
and 0.288, respectively, when the missing data are the
infection times and when they are the removal times of
four additional households. This seems to suggest that
obtaining the removal times of four additional households is
more desirable for this set of observed data. Incidentally,
the prior probability of $[\beta_1 < 0]$ is 0.5 and the
posterior probability of $[\beta_1 < 0]$, given the removal
times of the 20 households, is 0.739. Although we
have treated only an oversimplified example, this
simulation study seems to suggest that the relative
information measure proposed by Nicolae, Meng and
Kong (2008) is useful in infectious disease data analysis.

2. A TEST FOR MONOTONICITY OF A 
REGRESSION FUNCTION 

Let S denote the set of all continuous functions on 
[0, 1] and I denote the set of all nondecreasing con- 
tinuous functions on [0, 1] . Consider the regression 
model 

$$Y_k = F(X_k) + \sigma e_k,$$

for some $F$ in $S$. Here for $k = 0, \ldots, K$, $Y_k$ is a response
variable, $X_k$ is a constant design point in
$[0, 1]$, and the errors $\{e_k\}$ are assumed to be independent
and standard normal; $\sigma$ is a positive constant.

We are interested in testing the hypothesis $H_0$
that the regression function $F$ is nondecreasing and
wish to know the way to collect more data properly.
We will introduce a probability measure on $S$, and
consider a Bayesian approach.

Let $B = \bigcup_{n=1}^{\infty} (\{n\} \times \mathbb{R}^{n+1})$ and
$\varphi_{i,n}(t) = C_i^n t^i (1-t)^{n-i}$ for $t \in [0, 1]$.
For $b_n = (b_{0,n}, \ldots, b_{n,n})$, we define

$$F_{b_n}(t) = \sum_{i=0}^{n} b_{i,n} \varphi_{i,n}(t);$$

note $F_{b_n}$ is called a Bernstein polynomial with coefficients
$b_{0,n}, \ldots, b_{n,n}$. It is readily seen that $F_{b_n}(\cdot)$
is a member of $S$ and it is a member of $I$ if
$b_n \in \{b_n \mid b_{0,n} \le \cdots \le b_{n,n}\}$.
Let $S_n = \{F_{b_n} \mid b_n \in \mathbb{R}^{n+1}\}$. It
is clear that $S \supset \bigcup_{n=1}^{\infty} S_n$. A probability measure $\pi$
can be introduced on $S$ as follows. Let $\pi_n$ be a conditional
density on $\mathbb{R}^{n+1}$ and $p$ a probability mass
function on $\{1, 2, \ldots\}$; define $\pi(n, b_n) = p(n)\pi_n(b_n)$,
which introduces a probability measure on
$\bigcup_{n=1}^{\infty} (\{n\} \times \mathbb{R}^{n+1})$.
Identifying a Bernstein polynomial with its
order and coefficients, we can regard $\pi$ as a probability
on $\bigcup_{n=1}^{\infty} S_n$, hence on $S$. Priors of this form
are referred to as Bernstein priors.
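
The construction can be sketched in a few lines. The code below assumes the standard form of a Bernstein polynomial, a sum over i of the i-th coefficient times C(n, i) t^i (1 - t)^(n - i), and checks the sufficient condition for monotonicity (nondecreasing coefficients). The function names are hypothetical.

```python
from math import comb


def bernstein(b, t):
    """Evaluate the Bernstein polynomial of order n = len(b) - 1 at t in [0, 1]."""
    n = len(b) - 1
    return sum(b[i] * comb(n, i) * t**i * (1 - t)**(n - i)
               for i in range(n + 1))


def has_nondecreasing_coefficients(b):
    """Sufficient (not necessary) condition for the polynomial to be nondecreasing."""
    return all(b[i] <= b[i + 1] for i in range(len(b) - 1))


# Example: coefficients (0, 0.3, 0.6) reproduce the line 0.6 * t on [0, 1].
coeffs = [0.0, 0.3, 0.6]
values = [bernstein(coeffs, k / 4) for k in range(5)]
```

Note that the polynomial equals its first coefficient at t = 0 and its last coefficient at t = 1, so the coefficients directly control the endpoint values.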

Chang et al. (2007) showed that suitably introduced
Bernstein priors facilitate the estimation of
$F$ under various shape restrictions. In fact, this approach
also provides a direct assessment of the hypothesis
$H_0$ that $F$ is in $I$ by considering the ratio
of the posterior probability to the prior probability
of the set $I$. We note that the Bernstein priors used
in Chang et al. (2005) and Chang et al. (2007) have
large supports and yet take into consideration the
shape restrictions. The prior on $S$ that we use in
the following simulation is constructed in the spirit
of these references and is motivated by the simple observation
that if $b_{i,n}$ is in $[\tau_1, \tau_2]$ for every $i$, then $F_{b_n}$
takes values in $[\tau_1, \tau_2]$, and a continuous function with values
in $[\tau_1, \tau_2]$ can be approximated by Bernstein polynomials
with coefficients contained in $[\tau_1, \tau_2]$.

Suppose we have collected response variables at
$X_0, \ldots, X_K$ and would like to know the relative
information of the observed data when more
response variables are taken at additional design
points $x_0, \ldots, x_L$. The following simulation studies
are meant to illustrate the use of (BI3) in this problem.
Assume $F(t) = 0.6t$ for $t$ in $[0, 1]$ and $\sigma = 0.4$.
Let $K = 9$ and $X_k = k/9$ for $k = 0, \ldots, 9$. We generate
one set of data according to this specification,
and then calculate (BI3) under several missing data
scenarios. When $L = K$ and $x_0 = X_0, \ldots, x_L = X_L$,
we find (BI3) is equal to 0.139. When $(0, x_0, \ldots, x_4, 0.5)$
form an equal length partition of the interval
$[0, 0.5]$ and $(0.5, x_5, \ldots, x_9, 1)$ form an equal length
partition of the interval $[0.5, 1]$, we find (BI3) is
equal to 0.346. This suggests that the former design points
would be preferable to the latter when additional
data are needed.
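
The two candidate designs and the data-generating step can be written out as follows. This is only a sketch: it reads "equal length partition" as meaning that the five new points cut each half interval into six equal pieces, which is our interpretation, and the function names are hypothetical. The slope 0.6 and error scale 0.4 are the values stated above.

```python
import random


def half_interval_design():
    """Five interior points in [0, 0.5] and five in [0.5, 1], equally spaced.

    Together with the endpoints, each group splits its half interval into
    six pieces of length 1/12 (an interpretation of the text).
    """
    left = [0.5 * j / 6 for j in range(1, 6)]
    right = [0.5 + 0.5 * j / 6 for j in range(1, 6)]
    return left + right


def simulate_responses(design, slope=0.6, sigma=0.4, rng=None):
    """Draw one response per design point from the linear model with normal errors."""
    rng = rng or random.Random()
    return [slope * x + sigma * rng.gauss(0.0, 1.0) for x in design]


observed_design = [k / 9 for k in range(10)]   # original design points
replicate_design = list(observed_design)        # first scenario: repeat them
shifted_design = half_interval_design()         # second scenario
y_observed = simulate_responses(observed_design, rng=random.Random(1))
```

Under this reading, the second scenario avoids the endpoints 0, 0.5 and 1, whereas the first scenario replicates the original grid including both endpoints.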

To have some idea for the case $L = 2K$, we find
(BI3) is 0.052 if $x_{2k} = x_{2k+1} = X_k$ for $k = 0, \ldots, K$,
and is 0.217 if $(0, x_0, \ldots, x_9, 0.5)$ form an equal length
partition of the interval $[0, 0.5]$ and
$(0.5, x_{10}, \ldots, x_{19}, 1)$ form an equal length partition of the interval
$[0.5, 1]$. We note that the prior probability of $I$ is
0.0006 and the posterior probability of $I$ is 0.0015.
In summary, we find the measure of relative information
(BI3) useful in selecting extra design points
for data collection in this regression example.



3. SOME COMPUTATIONAL REMARKS 

Nicolae, Meng and Kong (2008) pointed out that (24) 
may be problematic because of the large variability 
in the likelihood ratios. That this problem does ap- 
pear in the above two examples is the sole reason 
that only extensions of (25) are used here. 

Because we work with Bayesian tests, in which
priors are already specified, it seems natural to
use the corresponding posteriors in the calculation
of (24) and (25) and their extensions like (BI3) and
(BI4). In particular, the $E_0$ in (BI3) and (BI4) is the
expectation under the conditional posterior distribution
given the null hypothesis. It may happen that the (unconditional)
posterior probability of the null hypothesis is so small that
sampling from the conditional posterior distribution
needs large computation time, which may make
the calculation of (BI3) hard. In this connection, we
would like to note that although the posterior probability
of the null hypothesis in the above regression problem is somewhat
small, it is still manageable.
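
One way to see the cost is the following sketch, which assumes that conditional draws are obtained by simple rejection: draw from the unconditional posterior and keep only the draws that satisfy the null hypothesis. This scheme is an assumption made for illustration and need not be the one used for the results above. If the posterior probability of the null is p, each kept draw costs about 1/p unconditional draws on average; for p = 0.0015 that is roughly 670 draws per kept draw. The function and argument names are hypothetical.

```python
import random


def conditional_draws(sample_posterior, in_null, n_wanted, rng):
    """Rejection sampling from the posterior restricted to the null.

    sample_posterior(rng) returns one unconditional posterior draw;
    in_null(draw) says whether that draw satisfies the null hypothesis.
    Returns the kept draws and the total number of draws attempted.
    """
    kept, tried = [], 0
    while len(kept) < n_wanted:
        draw = sample_posterior(rng)
        tried += 1
        if in_null(draw):
            kept.append(draw)
    return kept, tried


# Toy stand-in: a standard normal "posterior" and the null event [draw < 0].
kept, tried = conditional_draws(
    lambda rng: rng.gauss(0.0, 1.0), lambda x: x < 0, 50, random.Random(0))
```

The ratio `tried / len(kept)` estimates the reciprocal of the posterior probability of the null, which is what drives the computation time.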